Step 1: Loading & Exploring Datasets

Study Area & H3 Resolution

We will use the entire United States as our study area.

This analysis will be done using H3 hexagons to join these datasets together. Each of these datasets originally come as rasters. We've ingested them to H3 hexagons, though we need to decide on a H3 resolution to perform our analysis on.

Here is a comparison of different H3 resolutions over the same study area:

Area of Interest	H3 Hex Resolution	Aver. H3 Cell Size	# H3 Cells in Area
Continental United States*	4	~1,770 km²	~10,000
Continental United States*	5	~252 km²	~70,000
Continental United States*	6	~36 km²	~49,000

United States being defined by the bounds: [-130, 25, -60, 50], i.e. approximately 17.6M km² (US is about 9.8M km²)

See the H3 Cell Counts Stats for more details

We'll decide to use a H3 resolution of 5 for this analysis.

H3 rule of thumb

Going up a resolution by 1 level reduces the area of each hexagon by a factor of 7. This means that at resolution 5, each hexagon is 7x7 = 49x smaller than at resolution 3.

Keep this in mind as more hexagons also require longer compute time.

Getting number of H3 cells for bounds and resolution

We can write a simple UDF to estimate the number of H3 cells for a given bounds and resolution.

@fused.udf
def udf(bounds: fused.types.Bounds = [-130, 25, -60, 50], res: int = 5):
    
    # Load common library for H3 utilities
    common = fused.load("https://github.com/fusedio/udfs/tree/56ec615/public/common/")
    
    # Convert bounds to GeoDataFrame and calculate UTM area
    gdf_bounds = common.bounds_to_gdf(bounds)
    gdf_bounds = common.add_utm_area(gdf_bounds)
    area_millions = round(gdf_bounds['utm_area_sqm'].iloc[0] / 1e12, 1)
    print(f"Bounds area: {area_millions}M square km")
    
    # Get H3 hexagons within bounds at specified resolution
    df_hex = common.bounds_to_hex(bounds=bounds, res=res, hex_col="hex")
    print(f"{df_hex.T}")
    
    return df_hex

Returns:

>>> Bounds area: 17.6M square km
>>>               0      ...               67634
hex  599309241731252223  ...  600345878259040255

[1 rows x 67635 columns]

This result is 17.6M km², i.e. about 1.8 times more than the actual area of the United States (9.8M km²).

This is because the bounds is a rectangle around the United States, not the exact boundary of the United States:

Data Ingestion: Raster to Hex

There are a few options for ingesting raster data to H3 hexagons

Method	Processing Method	Usability	How To
Single File	Area of Interest in 1 UDF	Recompute required for every new area / resolution	See Example UDF: DEM_Tile_Hexify (Set to Single UDF Mode)
Dynamic Tile	Area of Interest in parallel UDFs	Recompute required for every new area / resolution	See H3 Tiling Docs Section
Full Ingestion	Only once	Read only	Reach out to Fused (`info@fused.io`)

The rest of this example assumes datasets are already in some hex format.

Dataset 1: Crops

Data

Original Raster Dataset from CroplandCROS
Hex partitioned dataset by Fused available on Source Cooperative

For simplicity we already created a hex dataset in H3 cell size 5 for 2024: s3://fused-asset/misc/hex/CDL_h12k1p1/year=2024/overview/hex5.parquet

Reading data in UDF

Load the hex dataset using DuckDB from a .parquet on S3

# cdl_data_all_crops
@fused.udf
def udf():
    import duckdb

    df = duckdb.query(f"""
        SELECT * 
        FROM 's3://fused-asset/misc/hex/CDL_h12k1p1/year=2024/overview/hex5.parquet'
        and area>1000
    """).to_df()

    df = df.rename(columns={'data': 'crop_type'})

    return df

Returns:

>>>      0             1       ...        706262        706263
crop_type   0.000000e+00  0.000000e+00  ...  0.000000e+00  0.000000e+00
hex         5.992301e+17  5.992302e+17  ...  6.003962e+17  6.003962e+17
area        6.128944e+07  2.849653e+08  ...  2.162687e+07  2.917735e+08
chunk_id    5.767422e+17  5.767422e+17  ...  5.779033e+17  5.779033e+17
year        2.024000e+03  2.024000e+03  ...  2.024000e+03  2.024000e+03

This UDF:

Loads all the data as hex cells
Only keeps cells with an area greater than 1000 square kilometers (filters out smallest crops types)
Returns 1 row per crop type per cell (so each H3 cell can have multiple rows)

Visualizing a single crop type

Fused UDFs are serverless Python functions, so we can also create visualizations with them. We've built a simple UDF library to render data on maps.

We can create a new UDF that loads the data and filters for a specific crop type (you can see the CDL Crop codes here)

Code: Map UDF for a single crop

@fused.udf
def udf(
    crop_type=1 # Crop 1 = Corn
):
    utils = fused.load("https://github.com/fusedio/udfs/tree/6dd2c4e/community/milind/map_utils")

    data = fused.run("cdl_data_all_crops")
    # Keeping only the specific crop_type selected
    data = data[data['crop_type'] == crop_type]

    # keeping subset of cols so tooltip only shows these
    data = data[['crop_type', 'hex', 'area']]
    
    print(data)
    config = {
        "initialViewState": {
            "longitude": -95.7129,
            "latitude": 37.0902,
            "zoom": 3
        },
        "hexLayer": {
            "@@type": "H3HexagonLayer",
            "filled": True,
            "pickable": True,
            "extruded": False,
            "getHexagon": "@@=properties.hex",
            "getFillColor": {
                "@@function": "colorContinuous",
                "attr": "area",
                "domain": [0, 30_000_000],
                "steps": 20,
                "colors": "Peach"
            }
        }
    }
    return utils.deckgl_hex(
        data, 
        config=config
    )

This returns a standalone UDF:

Visualizing multiple crop types

In Fused Canvas we can create different visualization of the same UDF simply by filtering for a specific crop type. We'll create:

Our original cdl_data_all_crops UDF loading all crops types
A UDF returning a map of crop_type = 1 (Corn)
A UDF returning a map of crop_type = 3 (Rice)
A UDF returning a map of crop_type = 111 (Open Water)

Canvas →

Explore this Fused Canvas for yourself:

Link to live canvas
Click the "Download Canvas" button to download the canvas .zip file and upload it directly in your Workbench

Dataset 2: Elevation

Data

Original Raster tiles from Terrain Tiles
Hex partitioned dataset by Fused: s3://fused-asset/data/dem/r6_z4_100k.parquet (See on File Explorer)

Reading Data

The .parquet file we'll use isn't at hex 5 resolution, so we will dynamically:

Use h3_cell_to_parent(hex, 5) as hex, to convert the hex to the parent hex at resolution 5
Use avg(mean_value) as mean_value to aggregate by average elevation of all the elevation values covering the parent hex
Keep only values within the bounds viewport with h3_cell_to_lng(hex)

@fused.udf
def udf(
    bounds:fused.types.Bounds=[-130, 25, -60, 50]
):
    # Loading the common Fused hex library to access pre-installed H3 library in DuckDB
    common = fused.load("https://github.com/fusedio/udfs/tree/56ec615/public/common/")
    con = common.duckdb_connect() 

    query = """
        SELECT  
            h3_cell_to_parent(hex, 5) as hex, 
            avg(mean_value) as mean_value 
        FROM read_parquet('s3://fused-asset/data/dem/r6_z4_100k.parquet')
        where 1=1
        and h3_cell_to_lng(hex) between ? and ? 
        and h3_cell_to_lat(hex) between ? and ? 
        group by 1
    """
    data = con.execute(query, [bounds[0], bounds[2], bounds[1], bounds[3]]).df()

    return data

Returns:

                   0             1      ...         68042         68043
hex         5.992300e+17  5.992301e+17  ...  5.996693e+17  5.996693e+17
mean_value  3.876500e+02  2.390833e+02  ...  1.530952e+02  1.819881e+02

Visualizing Elevation

We can visualize the elevation data similarly to the crop data:

Show code details

@fused.udf
def udf():
utils = fused.load('team/map_utils')
data = fused.run("elevation_hex_mean_height")
print(data.T)

config = {
    "initialViewState": {
        "longitude": -95.7129,
        "latitude": 37.0902,
        "zoom": 3
    },
    "hexLayer": {
        "@@type": "H3HexagonLayer",
        "filled": True,
        "pickable": True,
        "extruded": False,
        "getHexagon": "@@=properties.hex",
        "getFillColor": {
            "@@function": "colorContinuous",
            "attr": "mean_value",
            "domain": [0, 3000], # Rough elevation range in meters in the US
            "steps": 20,
            "colors": "Earth"
        }
    }
}
return utils.deckgl_hex(
    data, 
    config=config
)

Joining the two together on a Fused Canvas:

Canvas →

Explore the Fused Canvas:

Link to live canvas
Click the "Download Canvas" button to download the canvas .zip file and upload it directly in your Workbench

Dataset 3: Temperature

Data

Original Raster tiles from ERA5 Climate Reanalysis
Hex partitioned dataset by Fused: s3://fused-asset/data/era5/t2m_daily_mean_v2_1000/ (See on File Explorer)

Reading Data

We'll create a UDF that takes:

A given year & month as YYYY-MM
Bounds list

And returns:

Average temperature for any given hexagon over that month

We can load the data using DuckDB and visualize it using the deckgl_hex function:

@fused.udf
def udf(month='2024-05',
    bounds:fused.types.Bounds=[-130, 25, -60, 50]
):
    path = f's3://fused-asset/data/era5/t2m_daily_mean_v2_1000/month={month}/0.parquet'
    
    common = fused.load("https://github.com/fusedio/udfs/tree/56ec615/public/common/")
    con = common.duckdb_connect() 

    query = """
        SELECT  
            hex, 
            avg(daily_mean) - 273.15 as daily_mean 
        FROM read_parquet(?)
        WHERE 1=1
        AND h3_cell_to_lng(hex) between ? and ? 
        AND h3_cell_to_lat(hex) between ? and ? 
        GROUP BY 1
    """

    data = con.execute(
        query,
        [path, bounds[0], bounds[2], bounds[1], bounds[3]]
    ).df()

    print(f"{data.T=}")

    return data

Note a few things here:

The dataset is not in hex 5, this is a different resolution than the other datasets we've been working with.
The data has a lot of "empty" areas.

This is because the underlying data, the ERA5 dataset is a much coarser spatial resolution than the other 2 datasets (around ~31km at the equator, compared to 30m for the Crop Data Layer and 5m for the Elevation data)

We can smooth the data to the hex 5 resolution using a k-ring neighborhood aggregation:

@fused.udf
def udf(month='2024-05',
    bounds:fused.types.Bounds=[-130, 25, -60, 50]
):
    common = fused.load("https://github.com/fusedio/udfs/tree/56ec615/public/common/")
    import pandas as pd  
    path = f's3://fused-asset/data/era5/t2m_daily_mean_v2_1000/month={month}/0.parquet'
 
    con = common.duckdb_connect() 

    query = """
        WITH base_data AS (
            SELECT  
                h3_cell_to_parent(hex, 5) as hex, 
                avg(daily_mean) - 273.15 as daily_mean 
            FROM read_parquet(?)
            WHERE 1=1
            AND h3_cell_to_lng(hex) between ? and ? 
            AND h3_cell_to_lat(hex) between ? and ? 
            GROUP BY 1
        ),
        all_hexes AS (
            -- Get all hexes including neighbors by expanding k-ring
            SELECT DISTINCT unnest(h3_grid_disk(hex, 1)) as hex
            FROM base_data
        ),
        smoothed AS (
            -- For each hex (original + neighbors), get its k-ring neighbors
            SELECT 
                ah.hex as hex,
                unnest(h3_grid_disk(ah.hex, 1)) as neighbor_hex
            FROM all_hexes ah
        )
        SELECT 
            s.hex,
            round(avg(b.daily_mean), 2) as daily_mean
        FROM smoothed s
        JOIN base_data b ON b.hex = s.neighbor_hex
        GROUP BY s.hex
    """ 

    data = con.execute(
        query,
        [path, bounds[0], bounds[2], bounds[1], bounds[3]]
    ).df()

    print(f"{data.T=}")

    return data

Visualizing a single month

As above we can visualize the temperature data across a single month:

Visualizing multiple months

We can create a Fused Canvas to visualize data of different months at once. We create a 'recap' Canvas to show, from left to right:

Original dataset at original hex resolution
Convert to parent hex at resolution 5
Smooth the data to the hex 5 resolution -> Visualize 3 different months at once

Canvas →

Explore the Fused Canvas:

Link to live canvas
Click the "Download Canvas" button to download the canvas .zip file and upload it directly in your Workbench

Summary & Next Steps

We now have each dataset at the same hex5 resolution over the same bounds of the United States. Next:

Study Area & H3 Resolution​

Data Ingestion: Raster to Hex​

Dataset 1: Crops​

Reading data in UDF​

Visualizing a single crop type​

Visualizing multiple crop types​

Dataset 2: Elevation​

Reading Data​

Visualizing Elevation​

Dataset 3: Temperature​

Reading Data​

Visualizing a single month​

Visualizing multiple months​

Summary & Next Steps​

Study Area & H3 Resolution

Data Ingestion: Raster to Hex

Dataset 1: Crops

Reading data in UDF

Visualizing a single crop type

Visualizing multiple crop types

Dataset 2: Elevation

Reading Data

Visualizing Elevation

Dataset 3: Temperature

Reading Data

Visualizing a single month

Visualizing multiple months

Summary & Next Steps